SAMPLING AND MEASUREMENT ERROR
PART  1:  SAMPLING  ERRORS 


Purpose  and  Overview 

The  intent  of  this  two-part  "statistical  primer"  is  to 
provide  the  reader  with  an  understanding  of  basic  con- 
cepts of  errors  in  estimates.  Sampling  a  population  in 
order  to  estimate  population  characteristics  may  result  in 
estimation  errors.  It  is  less  widely  recognized  that  mea- 
sures based  on  a  "complete  count"  may  also  have  a 
substantial  error  component.  This  presentation  should 
provide  the  reader  with  the  background  needed  to 
interpret  and  evaluate  rates  and  other  health-related 
measures,  and  to  understand  basic  discussions  of  mea- 
surement error  in  statistical  papers.  Discussions  of  ran- 
dom vs.  nonrandom  and  sampling  vs.  nonsampling 
errors  are  first  presented.  An  explanation  of  random 
sampling  error  leads  to  the  development  of  formulas  for 
quantifying  the  sampling  errors  of  means  and  propor- 
tions. In  addition,  a  guide  to  carrying  out  simple  random 
sampling  procedures  is  provided,  which  could  be  used 
to  gather  needed  information  in  a  practical  setting  such 
as  the  local  health  department.  In  Part  2,  the  estimation 
of  the  error  involved  in  vital  and  other  rates  based  on  a 
complete  enumeration  of  events  is  described,  and  a 
method  for  determining  the  statistical  significance  of  a 
difference  between  two  rates  is  presented. 

Random  vs.  Nonrandom  Errors 

Most  assessments  of  sampling  or  measurement  varia- 
bility are  based  on  the  concept  of  random  errors. 
Repeated  estimates  may  overstate  or  understate  the 
underlying  or  "true"  value  of  interest,  but  it  is  usually 
assumed  that  the  errors  of  measurement  are  random  in 
nature  and  correspond  to  a  known  distribution.  For 
example,  it  is  commonly  assumed  that  repeated  esti- 
mates of  a  population  characteristic  would  form  a  bell- 
shaped  curve  or  normal  distribution  if  plotted  on  paper 
(see Figure 1). This would be true whether the characteristic
were estimated through samples from a population
or  through  quantifications  based  on  all  elements  of  the 
population,  though  in  the  case  of  a  sample  the  variability 
from  one  estimate  to  the  next  will  usually  be  greater. 

FIGURE  1: 
In the physical sciences, as well as in the less precise
health  and  social  sciences,  errors  of  measurement  are 
important  to  consider.  Even  the  simple  measurements  of 
the  length  of  a  rod  are  found  to  vary.  The  "error"  of  such 
individual  measurement  is  sometimes  positive  and  some- 
times negative,  but  the  average  of  a  number  of  repeated 
measurements  usually  becomes  more  and  more  steady 
as  the  number  of  measurements  increases,  and  con- 
verges on  the  "true"  value  even  though  the  magnitude 
of  error  in  a  single  measurement  is  unknown.  Thus  the 
concept  of  probability  has  become  an  important  part  of 
the  theory  of  scientific  measurement.  For  most  of  human 
history absolute certainty has been considered an essential
condition for genuine knowledge. Recently, however,
"in the whole field of science the deductive-
mathematical  process  of  absolutely  certain  inference  is 
being  replaced  by  the  probabilistic-statistical  method  of 
uncertain inference" (1). If science is based on observation
and measurement and if each set of measurements
is  a  statistical  sample,  then  all  scientific  conclusions  are 
subject to a margin of uncertainty. It is the job of statistics
to  quantitatively  estimate  the  degree  of  this  uncertainty. 

In  some  cases  there  may  be  a  consistency  of  errors 
leading  to  a  definite  bias.  In  these  cases  of  nonrandom 
errors  the  usual  statistical  tools,  which  depend  on  the 
concept  of  randomness,  are  of  limited  help.  It  is  still 
important,  however,  to  consider  possible  measurement 
biases  since  these  errors  can  undermine  the  best  efforts 
to  assess  and  control  the  random  error  component. 

Sampling  vs.  Nonsampling  Errors 

Taking  a  sample  or  subset  of  a  population  in  order  to 
estimate  a  characteristic  of  the  whole  population  will 
involve  some  degree  of  sampling  error,  because  the 
characteristic  is  estimated  from  a  population  subset 
rather  than  from  the  entire  population.  If  a  sample  is 
drawn  in  a  random  fashion,  probability  theory  enables  us 
to  evaluate  the  likely  degree  of  sampling  errors,  i.e., 
those  errors  introduced  because  samples  vary  from  one 
to  the  next.  Nonsampling  errors,  on  the  other  hand,  are 
errors  of  measurement  and  may  be  more  or  less  random, 
or  they  may  be  systematic  errors  leading  to  a  bias  as  was 
mentioned above. There is nothing to be gained in
reducing sampling errors below a certain point as compared
with nonsampling errors. If nonsampling mistakes
such as response or interviewing errors are large, there is
no point in taking a huge sample in order to reduce the
sampling error component, since the total error will still
be large (2). Though this paper deals principally with the
question  of  sampling  error,  it  is  very  important  to  be 
aware  of  and  attempt  to  reduce  the  nonsampling  errors 
that  will  occur  in  a  sample  survey  project. 


A  sampling  approach  to  measuring  a  population  char- 
acteristic of  interest  may  produce  a  more  accurate  esti- 
mate than  a  census  or  complete  enumeration.  A  sample 
survey  requires  a  much  smaller  staff  than  a  complete 
enumeration  and  it  is  therefore  possible  to  employ  bet- 
ter trained  people  with  higher  pay,  and  also  to  maintain 
closer  supervision  of  their  work.  If  the  resulting  reduc- 
tion of  nonsampling  errors  is  greater  than  the  amount  of 
sampling  error  that  would  be  introduced,  estimates  pro- 
duced from  a  sample  survey  will  be  more  accurate  than 
those  produced  from  a  complete  enumeration.  In  addi- 
tion, the  sampling  approach  produces  estimates  much 
more  quickly  and  less  expensively  than  a  complete 
enumeration. 

Random  Sampling  Error 

In  this  section  we  will  examine  more  closely  the  mean- 
ing of  sampling  error  and  present  some  very  basic  mea- 
sures of  sampling  error.  The  formulas  for  estimating 
error  that  are  presented  below  assume  that  a  simple 
random  sample  has  been  drawn  from  a  population  and 
that  one  wants  to  estimate  characteristics  of  the  entire 
population  based  on  the  results  of  the  single  sample. 
Simple  random  sampling  is  the  selection  from  a  com- 
plete list  of  a  population  of  a  subset  of  the  population  so 
that  on  any  given  draw  the  probabilities  of  all  remaining 
individuals  being  selected  are  equal  regardless  of  the 
individuals  previously  selected.  For  example,  assigning 
the  numbers  1  through  1000  to  the  members  of  a  popula- 
tion of  1000  persons  and  then  randomly  selecting  100 
different  numbers  between  1  and  1000  would  result  in  a 
simple  random  sample.  It  should  be  noted  that  simple 
random  sampling  is  but  one  type  of  randomized  sam- 
pling, and  that  while  simple  random  sampling  assumes 
equal  probability  of  selection,  not  all  equal  probability  of 
selection  methods  are  simple  random  sampling. 

Sample  designs  more  complex  than  simple  random 
sampling  are  sometimes  used  and  the  measures  of  sam- 
pling error  associated  with  these  designs  are  also  more 
complex  than  the  ones  presented  here.  In  a  stratified 
sample  all  individuals  are  first  divided  into  groups  or 
categories  and  independent  samples  are  then  selected 
within  each  group  or  stratum.  This  method  may  be  used 
to  assure  that  members  of  a  certain  subgroup  are 
included  in  the  sample,  or  if  the  strata  are  relatively 
homogeneous  fewer  cases  may  be  required  to  achieve  a 
given  degree  of  accuracy.  In  cluster  sampling,  clusters  or 
groups  of  elements  are  sampled  rather  than  individual 
elements. For example, the census tracts in a city might
be  randomly  sampled  and  then  all  of  the  persons  in  the 
selected  tracts  would  comprise  the  sample.  Cluster  sam- 
ples yield  greater  sampling  errors  than  simple  random 
samples  of  the  same  size,  but  the  costs  associated  with 
cluster  sampling  are  usually  considerably  less.  The  gen- 
eral problem  is  essentially  that  of  balancing  cost  and 
efficiency. For large-scale surveys these more complex
designs are often desirable, and one should consult a
sampling  text  or  sampling  specialist  for  details  and  assist- 
ance (3). 

A  basic  concept  involved  in  the  determination  of 
sampling  error  is  that  of  a  sampling  distribution.  Consider 
a  population  from  which  we  wish  to  draw  a  random 
sample  in  order  to  estimate  the  population  mean,  Xp,  for 
example,  average  number  of  physician  visits  per  year. 
Assume  that  this  population  has  a  standard  deviation  of 
SDp,  which  is  a  measure  of  the  dispersion  of  the  individ- 
ual numbers  of  physician  visits  around  the  population 
average  or  mean.  We  would  use  the  mean  of  the  sample, 
Xs, to estimate the population mean. If a very large
number  of  random  samples  were  drawn  from  this  popu- 
lation, the  distribution  of  the  means  of  these  samples 
would  form  the  sampling  distribution  of  the  mean. 
There  is  a  "central-limit  theorem"  in  statistics  that  states: 
If  repeated  random  samples  of  size  N  are  drawn  from  a 
normally-distributed  population,  with  mean  Xp  and 
standard  deviation  SDp,  the  sampling  distribution  of 
sample means will be normally-distributed, with mean
Xp and standard deviation $SD_p/\sqrt{N}$. This standard deviation
of the sampling distribution is referred to as the
standard error of the estimate Xs. This theorem tells us
that  the  larger  the  sample  size  selected,  the  smaller  the 
standard  error,  i.e.,  the  more  the  sample  means  will 
cluster  around  the  true  population  mean.  Further,  to  cut 
the  standard  error  in  half  we  need  to  quadruple  N.  Also, 
the  more  homogeneous  the  population  is  to  begin  with 
(the  smaller  the  value  of  SDp)  the  smaller  the  standard 
error  and  thus  the  greater  the  clustering  of  sample 
means  about  the  population  mean.  (4)  Figure  2  por- 
trays the  relationship  between  the  population  distribu- 
tion and  two  possible  sampling  distributions. 
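
The relationship between sample size and standard error can be checked with a few lines of code. The following is only a minimal sketch: the function name and the population standard deviation of 2.0 visits are assumptions chosen for illustration, not values from this paper.

```python
import math

def standard_error(sd_p: float, n: int) -> float:
    """Standard error of a sample mean: SD_p / sqrt(N)."""
    return sd_p / math.sqrt(n)

sd_p = 2.0  # hypothetical population standard deviation (visits per year)
for n in (100, 400, 1600):
    # Each fourfold increase in N cuts the standard error in half.
    print(n, round(standard_error(sd_p, n), 3))
```

With these assumed values the standard error falls from .2 to .1 to .05 as N goes from 100 to 400 to 1600.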

FIGURE  2: 

Comparison of Population Distribution and Normal
Sampling Distributions for Different-Sized Samples
A "law of large numbers," closely related to the
central-limit theorem (and itself called the central limit
theorem in most modern texts), states: If repeated random
samples of size N are drawn from any population (of whatever
form) having a mean Xp and a standard deviation
SDp, then as N becomes large the sampling distribution
of sample means approaches normality as a limit, with
mean Xp and standard deviation $SD_p/\sqrt{N}$. This theorem
says that "no matter how unusual a distribution we
start  with,  provided  N  is  sufficiently  large,  we  can  count 
on  a  sampling  distribution  which  is  approximately  nor- 
mal. Since  it  is  the  sampling  distribution,  and  not  the 
population,  which  will  be  used  in  significance  tests,  this 
means  that  whenever  N  is  large  we  can  completely  relax 
the  assumption  about  the  normality  of  the  population 
and  still  make  use  of  the  normal  curve  in  our  tests" 
(5).  Both  the  central-limit  theorem  and  the  law  of  large 
numbers  assume  that  simple  random  samples  have  been 
drawn,  and  are  less  appropriate  with  more  complex 
sample  designs. 
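
The claim that sample means behave normally even when the population does not can be explored by simulation. The sketch below is an illustration under stated assumptions: the population is a made-up, strongly skewed (exponential) set of 20,000 values with a mean near 5, and the function name, seed, and number of draws are arbitrary choices. It counts how often a sample mean falls within 1.96 standard errors of the population mean.

```python
import random
import statistics

def coverage_for_skewed_population(n, draws, seed=1):
    """Share of sample means lying within 1.96 * SD_p / sqrt(N) of the population mean."""
    rng = random.Random(seed)
    # Hypothetical skewed population: exponential values with mean about 5.
    population = [rng.expovariate(1 / 5.0) for _ in range(20_000)]
    mean_p = statistics.fmean(population)
    sd_p = statistics.pstdev(population)
    se = sd_p / n ** 0.5
    hits = 0
    for _ in range(draws):
        # Simple random sample without replacement; N is tiny relative to the population.
        sample_mean = statistics.fmean(rng.sample(population, n))
        if abs(sample_mean - mean_p) <= 1.96 * se:
            hits += 1
    return hits / draws

print(coverage_for_skewed_population(n=100, draws=2000))
```

If the normal approximation holds, the printed share should be close to .95 even though the population itself is far from bell-shaped.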

Many  times  when  we  want  to  compute  the  standard 
error  of  a  sample  estimate  (in  this  case  the  estimate  is  of 
the  mean)  the  standard  deviation  in  the  population  is  not 
known.  Therefore  the  sample  standard  deviation  is  fre- 
quently used  as  an  estimate  of  the  population  value. 
Thus the estimate of the standard error would become
$SD_s/\sqrt{N}$.

We  are  now  prepared  to  add  a  margin  of  error  to  the 
estimate  of  average  annual  number  of  physician  visits 
derived  from  one  sample  from  the  population.  We  know 
that  in  a  normal  distribution  a  certain  percent  of  the 
cases  lie  within  one  standard  error  of  the  mean,  a  larger 
percent  within  two  standard  errors,  etc.  For  example,  we 
know that 95 percent of the cases lie within 1.96 standard
errors  of  the  mean.  Therefore,  if  we  put  a  "confidence 
interval"  of  1.96  standard  errors  around  the  sample  esti- 
mate of  the  mean,  the  population  mean  falls  within  this 
interval  95  out  of  100  samples  on  the  average  (see  Figure 
3).  The  flip-side  of  this  coin  is,  of  course,  that  5  times  out 
of  100,  on  the  average,  this  interval  does  not  include  the 
population  mean.  This  95  percent  confidence  level  is 
frequently  used  (or  p  =  .05,  i.e.,  the  probability  of  being 
wrong  is  5  percent),  but  for  99  percent  confidence,  for 
example,  a  wider  interval  of  2.57  standard  errors  should 
be  used. 

Let  us  take  a  numerical  example.  Assume  that  a  sample 
of  500  persons  is  taken  from  a  population  and  that  the 
average  number  of  physician  visits  per  year  for  this  sam- 
ple is  5.1.  Assume  that  the  sample  standard  deviation  is 
2.2  visits.*  The  estimate  of  the  standard  error  is  therefore 
$2.2/\sqrt{500} = .098$, and we can say that for 95 of 100 samples
the population mean lies between 5.1 ± 1.96 (.098) = 5.1 ±
.192, or that we are 95 percent certain that the average
number  of  physician  visits  in  the  population  is  between 
4.91  and  5.29.  This  error  margin  is  based  only  on  a  con- 
sideration of  sampling  error  and  assumes  that  nonsam- 
pling  errors  have  been  eliminated. 
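
The arithmetic of this example can be reproduced in code. This is a minimal sketch; the function name is hypothetical, and the multiplier of 1.96 is the 95 percent value used in the text.

```python
import math

def mean_confidence_interval(sample_mean, sample_sd, n, z=1.96):
    """Return (standard error, lower limit, upper limit) for a sample mean."""
    se = sample_sd / math.sqrt(n)   # estimated standard error, SD_s / sqrt(N)
    margin = z * se                 # half-width of the confidence interval
    return se, sample_mean - margin, sample_mean + margin

se, low, high = mean_confidence_interval(5.1, 2.2, 500)
print(f"SE = {se:.3f}, 95% interval = {low:.2f} to {high:.2f}")
# SE = 0.098, 95% interval = 4.91 to 5.29
```

Passing z=2.57 instead would give the wider 99 percent interval mentioned above.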


FIGURE 3:

Comparison of Confidence Intervals with the Sampling
Distribution of the Mean, Showing Why 95 Percent
Confidence Intervals Include Xp 95 Percent of the Time

where    s = standard error
X1 = mean of sample 1
X2 = mean of sample 2
Xp = population mean

(adapted from Blalock, ref. (2), p. 159)


In  the  example  above  we  were  estimating  a  numerical 
average,  but  frequently  one  will  want  to  estimate  a  popu- 
lation proportion  (p),  for  example  the  proportion  or 
percent  of  persons  who  smoke.  The  process  for  estimat- 
ing the  standard  error  in  this  case  is  very  similar  to  the 
one shown above. The standard deviation of a proportion
is simply $\sqrt{pq}$, where q is 1-p. The standard error
(which is the standard deviation of the sampling distribution)
is $\sqrt{pq}/\sqrt{N}$, or $\sqrt{pq/N}$, where N is the sample
size. Let's assume that we again use the sample standard
deviation as an estimate of the population standard deviation,
and that the proportion of persons who smoke in
the sample is ps = .41. The estimate of the standard error,
in the case of a sample of 500, is therefore

$\sqrt{.41(.59)/500} = .022$.

We  can  then  say  with  a  five  percent  chance  of  being 
wrong that the population proportion is .41 ± 1.96 (.022),
or between .367 and .453. This calculation of the confidence
interval is based on the normal approximation to
the binomial distribution, and assumes that both Np and
N(1-p), or Nq, are greater than 10. The confidence
interval  of  a  sample  proportion  is  frequently  used 
because  of  its  relative  simplicity. 
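
The same calculation for a proportion might be sketched as follows. The function name is hypothetical, and the check that Np and Nq both exceed 10 simply encodes the rule of thumb stated above.

```python
import math

def proportion_confidence_interval(p_s, n, z=1.96):
    """Return (standard error, lower limit, upper limit) for a sample proportion."""
    q_s = 1.0 - p_s
    if n * p_s <= 10 or n * q_s <= 10:
        # The normal approximation to the binomial is doubtful in this case.
        raise ValueError("N*p and N*q should both be greater than 10")
    se = math.sqrt(p_s * q_s / n)   # estimated standard error, sqrt(pq/N)
    return se, p_s - z * se, p_s + z * se

se, low, high = proportion_confidence_interval(0.41, 500)
print(f"SE = {se:.3f}, 95% interval = {low:.3f} to {high:.3f}")
# SE = 0.022, 95% interval = 0.367 to 0.453
```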

*The standard deviation is a measure of dispersion of the cases around
the mean and is computed as the square root of the sum of the squared
deviations of each case from the mean divided by the number of cases
minus one. In this example this would be

$SD_s = \sqrt{\dfrac{\sum (X_i - X_s)^2}{500-1}} = 2.2$.


It  should  be  noted  that  nowhere  in  this  discussion  of 
standard  errors  has  the  size  of  the  population  from 
which  the  sample  is  drawn  been  considered.  If  sampling 
is  done  with  replacement,  i.e.,  after  each  draw  the  ele- 
ment drawn  is  replaced  and  has  a  chance  of  being 
chosen  again,  then  a  sample  of  100,  for  example,  will 
have  exactly  the  same  standard  error  whether  it  is  drawn 
from a population of 1000 or 100,000 or 100,000,000. Most
of  the  time,  however,  we  draw  a  sample  without 
replacement  so  that  no  individual  is  given  a  chance  to  be 
drawn  a  second  time.  In  these  cases  a  "finite  population 
correction"  should  technically  be  multiplied  times  the 
standard  error  as  an  adjustment  to  the  formulas  above. 
This correction is $\sqrt{1-(N/M)}$, where N is the sample size,
M  is  the  size  of  the  total  population,  and  N/M  is  the 
"sampling  fraction."  It  can  quickly  be  seen  that  this 
correction  factor  is  negligible  unless  the  sample  is  a  large 
proportion  of  the  population.  A  sample  of  500  out  of  a 
population  of  320,000  (Wake  County,  for  example) 
would  produce  a  correction  of  .9992  to  the  standard 
error, while a sample of 500 out of 6,000,000 (N.C. population)
would produce a correction factor of .99996. Even
if  the  sample  were  half  of  the  population  the  correction 
factor  would  be  .71  and  the  standard  error  would  thus  be 
nearly  three-fourths  of  that  obtained  if  the  same  size 
sample  were  drawn  from  an  infinite  population.  Ignor- 
ing the  finite  population  correction  will  give  a  picture  of 
less  precision  than  there  actually  is. 
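
The correction factors quoted in this paragraph can be verified directly. This is a small sketch; the function name is an assumption, and N and M follow the usage above (sample size and population size).

```python
import math

def finite_population_correction(n, m):
    """Multiplier for the standard error when sampling n of m without replacement."""
    return math.sqrt(1 - n / m)

print(round(finite_population_correction(500, 320_000), 4))    # 0.9992
print(round(finite_population_correction(500, 6_000_000), 5))  # 0.99996
print(round(finite_population_correction(500, 1_000), 2))      # 0.71
```

The last line uses a hypothetical population of 1,000 to represent a sample that is half of the population.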

This  section  has  covered  a  lot  of  material  in  a  few 
paragraphs  and  yet  just  touches  the  surface  of  the  litera- 
ture on  sampling  error.  We  hope  that  it  will  nevertheless 
give  the  reader  a  rudimentary  understanding  of  the  the- 
ory of  sampling,  and  serve  as  a  basis  for  the  guidelines  for 
selecting  a  simple  random  sample  that  are  presented  in 
the  next  section. 

How  to  Select  a  Simple  Random  Sample 

Before  selecting  a  random  sample  one  must  first  know 
what  is  a  desirable  sample  size  for  the  project  at  hand. 
What  is  desirable,  however,  is  not  always  feasible.  If  one 
has  a  maximum  budget  to  interview  a  sample  of 
respondents,  for  example,  then  the  sample  size  is  simply 
the  total  budget  divided  by  the  cost  per  interview,  unless 
it  is  determined  that  a  smaller  sample  size  will  produce 
an  acceptable  margin  of  error. 

To  select  a  sample  size,  we  must  first  determine  two 
criteria:  margin  of  error  and  level  of  confidence.  In  the 
case  of  estimating  a  proportion  (or  percent),  for  exam- 
ple, one  might  require  that  the  estimate  from  the  sample 
be  within  plus  or  minus  5  percentage  points  of  the  "true" 
population  value  (margin  of  error)  for  95  out  of  100 
samples  (level  of  confidence).  As  we  reduce  the  margin 
of  error  or  increase  the  level  of  confidence,  a  larger 
sample size is required. A distinction should be made
between a margin of error in terms of absolute percentage
points versus a certain relative percent of the estimate.
For example, with a margin of error of five absolute
percentage points and a sample estimate of 20% (or
a  proportion  of  .20),  we  would  be,  with  some  degree  of 
confidence,  certain  that  the  population  percentage  falls 
between  15  and  25.  A  relative  margin  of  error  of  5  per- 
cent of  the  estimate  of  20  percent,  however,  would  be 
plus  or  minus  1  percentage  point,  and  a  much  larger 
sample  size  would  be  required  to  ensure  with  a  specified 
degree  of  certainty  that  the  population  percent  is 
between  19  and  21.  For  an  estimate  that  is  not  a  propor- 
tion, such  as  average  number  of  physician  visits  per  year, 
the  margin  of  error  is  frequently  expressed  in  terms  of  a 
certain  percentage  of  the  average  (or  mean  value). 

Let  us  assume  that  one  wants  to  estimate  a  percent 
(proportion)  so  that  the  population  value  lies  within  3 
percentage  points  of  the  sample  percent  99  times  out  of 
100.  If  the  sampling  distribution  is  normal,  the  popula- 
tion percent  lies  within  2.57  standard  errors  of  the  sample 
percent  in  99  percent  of  all  samples.  (This  number  can  be 
derived  from  a  table  of  areas  under  the  normal  curve, 
which is found in most standard statistical texts.) We
therefore want 2.57 standard errors to equal .03. As was
shown before, the standard error of a proportion is
$\sqrt{pq/N}$, but in this case we need an estimate of the
population value of p before the sample is drawn. If no
estimate is available, .5 is frequently used since this value
of p will produce the largest standard error, and thus the
resulting sample size will be larger than that needed for
any other value of p. The next step is to solve for N
where: $2.57\sqrt{.5(.5)/N} = .03$, or $\sqrt{.25/N} = .01167$, or N =
1835. A sample size of 1835 would produce an estimate
within plus or minus .03 of a proportion of .5 with a 99
percent  level  of  confidence.  The  reader  may  wish  to 
verify  the  required  sample  sizes  under  the  different 
combinations  of  criteria  that  are  shown  in  Table  1.  Keep 
in  mind  that  a  95  percent  level  of  confidence  assumes 
plus  or  minus  1.96  standard  errors  if  the  sampling  distri- 
bution is  normal. 


TABLE 1

Sample Sizes Required for Different Combinations of
Precision, Confidence Level, and Value of P

                                         Precision (±)
Value of P      Confidence Level      .05      .03      .01

.1 (or .9)            .95             138      384     3457
                      .99             238      660     5944

.5                    .95             384     1067     9604
                      .99             660     1835    16512
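
Readers who wish to verify the table can do so with a short script. The sketch below solves z * sqrt(pq/N) = margin for N, using the multipliers 1.96 and 2.57 from the text. Two points are assumptions rather than statements from the paper: results are rounded to the nearest whole number because that appears to match the printed figures, and the rows not labeled .5 are treated as p = .1 (equivalently .9) because that value reproduces them.

```python
def sample_size_for_proportion(p, margin, z):
    """N solving z * sqrt(p * (1 - p) / N) = margin, rounded to the nearest whole number."""
    return round(z ** 2 * p * (1 - p) / margin ** 2)

Z = {0.95: 1.96, 0.99: 2.57}  # multipliers used in the text
for p in (0.1, 0.5):
    for level, z in Z.items():
        row = [sample_size_for_proportion(p, m, z) for m in (0.05, 0.03, 0.01)]
        print(p, level, row)
```

In practice one might prefer to round up rather than to the nearest whole number, so that the stated margin of error is never exceeded.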


If  one  is  estimating  a  population  mean  (numerical 
average)  the  formula  for  sample  size  is  very  similar.  In  the 
case  of  average  number  of  physician  visits  per  year,  if  we 
wanted  to  estimate  the  population  average  within  plus 
or  minus  .5  visits  (which  would  be  within  10  percent  of  a 
mean  of  5.0)  with  95  percent  confidence,  we  would  solve 
for N in the equation: $1.96\,SD_p/\sqrt{N} = .5$. The problem,
of  course,  is  that  we  must  have  an  estimate  of  the  stan- 
dard deviation  in  the  population  (SDp)  before  the  sam- 
ple is  drawn  in  order  to  solve  for  N,  and  this  is  not  always 
available.  As  a  result,  sample  size  is  often  computed 
using  the  formula  for  a  proportion,  frequently  using  the 
conservative  estimate  of  p  =  .5  unless  better  information 
is  available.  Also,  it  may  be  possible  to  estimate  the 
population  standard  deviation  based  on  expert  knowl- 
edge of  the  range,  mode,  and  other  characteristics  of  the 
distribution  (6).  After  determining  a  sample  size  based 
on  formulas  such  as  these,  it  is  usually  a  good  idea  to 
increase  the  sample  size  as  much  as  possible  to  allow  for 
nonresponse  and  other  potential  problems.  Again,  it  is 
assumed  in  the  calculations  above  that  a  simple  random 
sample  will  be  drawn. 
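
Solving the equation for a mean gives N = (z * SDp / margin) squared. The sketch below is illustrative only: the function name is hypothetical, the planning value of 2.2 for SDp is borrowed from the earlier physician-visit example purely as an assumption, and the result is rounded up so that the margin is not exceeded.

```python
import math

def sample_size_for_mean(sd_p, margin, z=1.96):
    """Smallest N for which z * SD_p / sqrt(N) is no larger than the margin."""
    return math.ceil((z * sd_p / margin) ** 2)

# Assumed planning value SD_p = 2.2 visits; margin of plus or minus .5 visits.
print(sample_size_for_mean(sd_p=2.2, margin=0.5))  # 75
```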

If  it  has  been  determined  that  a  sample  size  of  500,  for 
example,  is  required  then  the  process  of  drawing  a  sim- 
ple random  sample  may  be  relatively  easy.  First  we  must 
have  a  list  of  each  individual  (person,  household,  etc.)  in 
the  population  and  be  sure  that  each  individual  is  listed 
only once. This may be impossible in some cases; for
example, if a sample of all nonwhites in Wake County
were desired, cluster or area sampling might be
required. To the degree that persons omitted from a list
differ  from  the  rest  of  the  population  with  respect  to  the 
characteristics  being  studied,  a  biased  sample  will  be 
obtained.  Telephone  directories,  for  example,  may  be 
biased  in  that  lower-income  groups  are  likely  to  be 
underrepresented;  also,  persons  with  unlisted  numbers 
may  be  an  unusual  population  group.  In  some  cases  it 
may  be  desirable  to  redefine  the  population  to  conform 
to  the  list  that  is  available.  Assuming,  however,  that  a 
complete list has been obtained, one would first associate
a number with each position on the list. If there
were  10,000  individuals  on  the  list,  then  for  a  sample  size 
of  500  one  would  randomly  choose  500  numbers  between 
1  and  10,000  and  the  individuals  corresponding  to  these 
numbers  would  fall  in  the  sample.  If  numbers  are  chosen 
that  have  already  been  selected,  these  repetitions  are 
omitted  until  we  have  500  individuals  in  the  sample. 
Random  numbers  may  be  chosen  from  a  random 
number  table,  or  some  calculators  and  most  computers 
will  generate  random  numbers. 
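
On a computer the drawing procedure just described might look like the following sketch. The function name and the seed are arbitrary assumptions; the logic follows the text, choosing random list positions and ignoring any position that has already been selected.

```python
import random

def draw_simple_random_sample(list_size, sample_size, rng):
    """Choose distinct list positions (1-based), skipping numbers already drawn."""
    if sample_size > list_size:
        raise ValueError("sample cannot be larger than the list")
    chosen = set()
    while len(chosen) < sample_size:
        # Adding a repeated number leaves the set unchanged, so repeats are omitted.
        chosen.add(rng.randint(1, list_size))
    return sorted(chosen)

rng = random.Random(2024)  # fixed seed only so the sketch is repeatable
positions = draw_simple_random_sample(10_000, 500, rng)
print(len(positions), positions[:5])
```

Python's standard library offers the same result in one call, random.sample(range(1, 10001), 500), which also draws without replacement.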

If  a  list  is  extremely  long,  or  if  a  sample  is  to  be  drawn 
from  administrative  records  or  files,  systematic  sampling 
may  be  much  easier  than  the  procedure  described 
above. In systematic sampling, instead of using a table of
random numbers one simply goes down a list taking
every  kth  individual,  starting  with  a  randomly  selected 
case  among  the  first  k  individuals.  Thus,  if  one  wanted  to 
select  a  sample  of  180  persons  from  a  list  of  3600,  we 
would  take  every  twentieth  in  the  list.  The  first  choice, 
however,  must  be  determined  randomly.  If  the  number 
5  were  selected  between  1  and  20,  the  sample  would 
consist  of  individuals  numbered  5,  25,  45,  65,  etc.  If  the 
ordering  used  in  compiling  the  list  or  in  a  set  of  records 
can  be  considered  to  be  essentially  random  with  respect 
to  the  variables  being  measured,  then  the  standard  error 
for  a  systematic  sample  will  be  equivalent  to  that  for  a 
simple  random  sample.  Many  lists  or  files  are  alphabeti- 
cal in  order,  but  in  most  cases  alphabetical  ordering  is 
irrelevant  to  the  variables  being  studied.  Systematic 
sampling  may  result  in  a  standard  error  larger  than  that 
for  a  simple  random  sample  of  the  same  size  if  the 
individuals  have  been  ordered  so  that  a  trend  occurs  or  if 
the  list  has  some  periodic  or  cyclical  characteristic  that 
corresponds  to  the  sampling  fraction.  (7) 
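
Systematic selection is equally simple to express in code. This is a minimal sketch with a hypothetical function name; it assumes the list size is a whole multiple of the sample size, as in the example of 180 persons from a list of 3600.

```python
import random

def systematic_sample(list_size, sample_size, start=None, rng=random):
    """Return list positions (1-based) chosen by taking every k-th individual."""
    k = list_size // sample_size       # sampling interval, e.g. 3600 // 180 = 20
    if start is None:
        start = rng.randint(1, k)      # random start among the first k individuals
    if not 1 <= start <= k:
        raise ValueError("start must lie between 1 and k")
    return [start + i * k for i in range(sample_size)]

positions = systematic_sample(3600, 180, start=5)
print(positions[:4], len(positions))   # [5, 25, 45, 65] 180
```

Leaving out the start argument lets the function pick the random starting point itself, which is what one would do in an actual project.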

Synopsis 

The  concept  of  random  errors  of  measurement  has 
been  presented,  in  the  context  of  sampling  from  a  popu- 
lation. Formulas  were  presented  for  calculating  these 
errors  of  measurement  in  terms  of  the  standard  error  of 
estimates  of  means  and  proportions.  Procedures  for 
selecting  simple  random  and  systematic  samples  were 
also  described.  Part  2  of  this  statistical  primer  will  discuss 
errors  in  measures  based  on  a  "complete  count"  and 
present  formulas  for  calculating  standard  errors  of  sim- 
ple rates  and  of  the  difference  between  two  rates. 

This  paper  only  touches  on  the  basics  of  sampling  and 
measurement  error,  and  the  reader  interested  in  com- 
puting standard  errors  for  more  complex  measures  or  in 
carrying  out  more  complex  sample  designs  should  con- 
sult the  references  listed  here  or  a  sampling  specialist. 
The  simple  examples  presented  here  are  not  always 
encountered  in  practice.  Frequently  one  will  study  more 
than  one  variable  at  a  time  in  a  sampling  project,  and  this 
may  complicate  the  picture  considerably.  It  is  usually 
best  to  draw  a  sample  size  that  will  allow  adequate  esti- 
mation of  the  variable  with  the  largest  standard  error. 

It  is  hoped  that  this  overview  will  provide  the  reader 
with  enough  background  to  understand  the  general 
problems  involved  in  a  sampling  project  and  to  seek 
further  help  and  consultation  as  necessary. 